The use of whole-genome sequencing as a tool for the study of infectious bacteria is of growing clinical interest. Chlamydia trachomatis is responsible for sexually transmitted infections and the blinding disease trachoma, which affect hundreds of millions of people worldwide. Recombination is widespread within the genome of C. trachomatis; thus, whole-genome sequencing is necessary to understand the evolution, diversity, and epidemiology of this pathogen. Until now, culture of C. trachomatis has been a prerequisite for obtaining DNA for whole-genome sequencing; however, because C. trachomatis is an obligate intracellular pathogen, this procedure is technically demanding and time consuming. Discarded clinical samples represent a large resource for sequencing the genomes of pathogens, yet clinical swabs frequently contain very low levels of C. trachomatis DNA and large amounts of contaminating microbial and human DNA. To determine whether it is possible to obtain whole-genome sequences from bacteria without the need for culture, we have devised an approach that combines immunomagnetic separation (IMS) for targeted bacterial enrichment with multiple displacement amplification (MDA) for whole-genome amplification. Using IMS-MDA in conjunction with high-throughput multiplexed Illumina sequencing, we have produced the first whole bacterial genome sequences directly from clinical samples. We also show that this method can be used to generate genome data from nonviable archived samples. This method will prove a useful tool in answering questions relating to the biology of many difficult-to-culture or fastidious bacteria of clinical concern.



[Supplemental material is available for this article.] 

Chlamydia trachomatis is a pathogen of global importance as the most common bacterial sexually transmitted infection (STI) (WHO 2011), and is also responsible for trachoma, the leading cause of infectious blindness worldwide (Mariotti et al. 2009; WHO 2012). Urogenital chlamydial infections usually manifest as urethritis and cervicitis, but are often asymptomatic and, if untreated, can result in severe complications and sequelae such as pelvic inflammatory disease, tubal damage, and infertility. In addition, some C. trachomatis infections are invasive, causing the disease lymphogranuloma venereum (LGV) (Burgoyne 1990).
C. trachomatis is an obligate intracellular pathogen with a specialized biphasic developmental cycle. The infectious elementary bodies (EBs) are taken up by the host cell into a cytoplasmic vacuole called an inclusion, where they differentiate into the actively replicating form, known as reticulate bodies (RBs). The developmental cycle is completed when RBs differentiate back into metabolically inert EB particles and are released from the host cell by lysis. Thus C. trachomatis requires tissue culture for in vitro growth, a technique that is technically challenging and time consuming. While cell culture used to be the "gold standard" for the laboratory diagnosis of C. trachomatis infections, it has been superseded by much more rapid and sensitive nucleic acid amplification tests (NAATs) (for review, see Skidmore et al. 2006).

For epidemiological surveillance, C. trachomatis strains have traditionally been classified into serovars based on the major outer membrane protein (MOMP), which represents the major surface antigen (Stephens et al. 1982; Wang et al. 1985). Currently ompA genotyping, based on the gene encoding MOMP, is more commonly performed (for review, see Pedersen et al. 2009), with ompA genotypes A–C associated with trachoma, D–K with urogenital infections, and L1–L3 with LGV. Recent publications have confirmed previous findings that ompA is not a reliable marker of phylogeny (Millman et al. 2001; Gomes et al. 2004; Brunelle and Sensabaugh 2006), owing to extensive recombination within the genomes of C. trachomatis strains (Jeffrey et al. 2010; Harris et al. 2012; Joseph et al. 2012), and that to fully understand the population structure and patterns of infection it is essential to determine the whole-genome sequence. The genome of C. trachomatis comprises a chromosome of 1.0 Mb and a plasmid of 7.5 kb, which have been found to be highly conserved between strains, with few indels and no variably present genomic islands identified to date (Stephens et al. 1998; Carlson et al. 2005; Thomson et al. 2008; Seth-Smith et al. 2009; Jeffrey et al. 2010; Unemo et al. 2010; Somboonna et al. 2011; Harris et al. 2012). The use of full-genome sequence data in hospital settings has been demonstrated in recent studies (Köser et al. 2012b; Snitkin et al. 2012) and promises to revolutionize epidemiology and clinical microbiology (for review, see Köser et al. 2012a). Obtaining these data rapidly is a new challenge, particularly pertinent in the study of difficult-to-culture or fastidious bacteria.

Until now, cell culture has been necessary to generate sufficient C. trachomatis DNA for genome sequencing (Stephens et al. 1998; Carlson et al. 2005; Thomson et al. 2008; Seth-Smith et al. 2009; Jeffrey et al. 2010; Unemo et al. 2010; Somboonna et al. 2011; Harris et al. 2012; Joseph et al. 2012). Starting from clinical samples, several months of passaging are often required. Not all strains of C. trachomatis culture equally well, with a subset of strains failing to grow in culture despite being detectable by other assays (for review, see Ridgway and Taylor-Robinson 1991). This implies that the requirement for culture may itself impose a selective bias on the strains whose genomes we are able to sequence. A rapid, simple, culture-independent technique for generating DNA for whole-genome sequencing is therefore required.

Whole-genome amplification (WGA), in particular the technique of multiple displacement amplification (MDA), can be used on very low concentrations of starting material to generate the quantities of DNA required for sequencing, and has been shown to provide complete genome coverage from bacterial DNA samples (Rodrigue et al. 2009; Chaparro et al. 2011). MDA is an isothermal DNA amplification process using φ29 polymerase and random hexamers with phosphorothioate modification, meaning that this technique can be used to amplify any DNA sample (Dean et al. 2001; Hosono et al. 2003). We aimed to determine whether it was possible to generate complete, accurate genome sequences of C. trachomatis from clinical samples with the use of MDA. To reduce the associated contamination and enrich for C. trachomatis, we combined MDA with immunomagnetic separation (IMS), an antibody-based approach for targeted enrichment of cells. IMS has been used to enrich for bacteria and remove PCR inhibitors in previous studies on genera including Listeria (Skjerve et al. 1990), Mycobacterium (Grant et al. 1998), Escherichia coli O157:H7 (Fratamico et al. 1992), Actinobacillus (Angen et al. 2001), and also Chlamydia (Niesters et al. 1991; Hedrum et al. 1992). Using this combined approach (IMS-MDA), we were able to generate whole-genome sequences of C. trachomatis directly from discarded clinical swabs, without the need for cell culture.



Results 

Accurate genome sequences from amplified C. trachomatis DNA 

To assess the sensitivity and accuracy of WGA of C. trachomatis DNA using MDA, a range of dilutions of C. trachomatis serovar L2 genomic DNA was used as the substrate for amplification. DNA was extracted from a single well of a 24-well tissue culture tray (24WT) and, as a pre-dilution control, sequenced on an Illumina GAII machine following standard protocols (see Methods) using 11 indexed sequence adapters, or tags, per lane. The resulting data were assembled using Velvet followed by manual improvement to generate a reference sequence comprising two contigs, with an unassembled gap in the tarp (translocated actin-recruiting phosphoprotein) gene, which contains a repetitive motif.

The same L2 genomic DNA was then subjected to a series of dilutions, and MDA reactions were performed on 1 µL of each dilution. The number of genome copies was determined using quantitative PCR (qPCR) with a TaqMan probe targeting the single-copy chromosomal ompA gene. Samples were sequenced on an Illumina HiSeq machine using nine tags per lane. To assess the coverage and accuracy of the resulting data, the sequencing reads were mapped in silico to the assembled L2 reference genome sequence, with the control reads self-mapped for comparison, and the variation between the reference and the sequenced samples was determined (Table 1).
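Genome copy numbers of this kind are usually read off a qPCR standard curve. The Python sketch below assumes a linear standard curve of Ct against log10(copies), fitted by ordinary least squares to a dilution series of known copy number; the function names and the idealized standard values (slope -3.32, intercept 38) are hypothetical and are not taken from the Methods.

```python
def fit_standard_curve(log10_copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept; returns (slope, intercept)."""
    n = len(cts)
    if n != len(log10_copies) or n < 2:
        raise ValueError("need at least two paired standards")
    mean_x = sum(log10_copies) / n
    mean_y = sum(cts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, cts))
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies from an observed Ct."""
    return 10 ** ((ct - intercept) / slope)


def efficiency_from_slope(slope):
    """PCR efficiency implied by the slope (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical standards: 10^3 to 10^7 copies lying on an idealized curve.
standards = [3, 4, 5, 6, 7]
standard_cts = [38 - 3.32 * x for x in standards]
slope, intercept = fit_standard_curve(standards, standard_cts)
print(round(slope, 2), round(intercept, 2))
print(f"{copies_from_ct(21.4, slope, intercept):.3g} copies in the reaction")
```

In practice the copies per reaction would still need to be divided by the template volume to give copies/µL; a slope near -3.32 corresponds to close to 100% amplification efficiency.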

The results show that complete coverage of the genome was achieved from all samples for which the starting material comprised at least 4800 input genome copies, giving 1,500,000 post-MDA copies. Below this, there was a dramatic drop-off in the number of post-MDA genome copies, resulting in a very low sequencing yield and only 26% coverage of the genome from an input of 3500 genome copies. This provides an approximate lower limit for the number of chromosomal copies required to generate accurate genome sequences using MDA. In all cases, the complete plasmid was covered to high depth by sequence reads.

For the samples that mapped to the complete chromosome (Dilution1 to Dilution7), mean chromosome coverage levels varied with yield, but in all cases were more than sufficient for accurate base calling. However, as the input DNA decreased, the variation in coverage level across the genome increased. This is illustrated by an increase in the coefficient of variation (CV = standard deviation/mean) of the coverage as the number of input genome copies decreased. These factors underline the need to use sequencing technologies that yield high numbers of reads per sample in order to generate sufficient depth of coverage to ensure accurate sequence generation. Coverage of the plasmid in the amplified samples was far greater than that of the chromosome, ranging from 79 to 213 times the chromosomal coverage (control = 10.5 times). This is due to preferential amplification of the small, circular plasmid, and means that it is not possible to estimate the plasmid copy number per chromosome from amplified samples.
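The coverage CV and the plasmid-to-chromosome coverage ratio described above can both be computed from per-base read depths. A minimal Python sketch follows, assuming tab-separated depth lines (sequence name, position, depth) such as those written by `samtools depth -a` (the `-a` flag keeps zero-coverage positions, which matter for the CV). The function names and toy sequence names are hypothetical, and the population standard deviation is an assumption, since the text does not say which form was used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def parse_depth(lines):
    """Group per-base depths by sequence name from (name, position, depth) tab-separated lines."""
    depths = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        name, _position, depth = line.rstrip("\n").split("\t")[:3]
        depths[name].append(int(depth))
    return dict(depths)


def coverage_cv(depths):
    """Coefficient of variation of coverage: standard deviation / mean."""
    mu = mean(depths)
    if mu == 0:
        raise ValueError("mean depth is zero; CV is undefined")
    return pstdev(depths) / mu


def relative_coverage(target_depths, baseline_depths):
    """Mean depth of one sequence (e.g., the plasmid) relative to another (e.g., the chromosome)."""
    return mean(target_depths) / mean(baseline_depths)


# Toy input with hypothetical sequence names.
lines = ["chrom\t1\t10", "chrom\t2\t90", "chrom\t3\t10", "chrom\t4\t90",
         "plasmid\t1\t5000", "plasmid\t2\t5000"]
d = parse_depth(lines)
print(coverage_cv(d["chrom"]), relative_coverage(d["plasmid"], d["chrom"]))
```

A perfectly even coverage profile gives a CV of 0; the more MDA over-amplifies some regions relative to others, the larger the CV becomes.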

The accuracy of the sequencing was determined through analysis of single nucleotide polymorphisms (SNPs), insertions, deletions, and heterogeneous sites after mapping against the reference assembly. All samples that mapped to 100% of the chromosome provided completely accurate genome sequences (chromosome and plasmid), with the exception of the sequence data derived from 4800 genome copies, for which eight chromosomal SNPs were identified (representing >99.99% accuracy). No insertions or deletions were identified in any of the samples giving 100% genome coverage, and only a single-base insertion was identified, in the data derived from Dilution8 with 3500 genome copies; these data were not complete across the whole genome, and the insertion lay in the plasmid sequence. Heterogeneous sites are defined here as base positions at which clear base calling cannot be performed due to a lack of consensus in the mapped reads. Our analysis of these showed that lower levels of pre-MDA input DNA caused higher levels of heterogeneity, and also led to the incorrect base calls (SNPs) seen in the Dilution7 sample containing 4800 genome copies. These inaccuracies are located in regions of high read coverage, with the variant base present in >75% of reads in all cases, indicating that an incorrect base may have been incorporated during amplification of these sites early in the MDA process.
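The distinction between a confident SNP and a heterogeneous site can be illustrated with a simple consensus rule applied to the bases observed at one position. This Python sketch assumes a hypothetical threshold of 75% agreement for a clear call; the text does not state the exact criterion used, and the function names are illustrative only. The accuracy helper shows why eight errors over the ~1.0-Mb chromosome still exceed 99.99% accuracy.

```python
from collections import Counter


def classify_site(bases, ref_base, min_fraction=0.75, min_depth=1):
    """Classify one position from its pileup bases.

    Returns "no_coverage", "heterogeneous" (no base reaches min_fraction),
    "match" (consensus equals the reference), or "snp" (consensus differs).
    min_fraction is an assumed threshold, not the published criterion.
    """
    counts = Counter(base.upper() for base in bases)
    depth = sum(counts.values())
    if depth < min_depth:
        return "no_coverage"
    top_base, top_count = counts.most_common(1)[0]
    if top_count / depth < min_fraction:
        return "heterogeneous"
    return "match" if top_base == ref_base.upper() else "snp"


def sequence_accuracy(n_errors, length_bp):
    """Fraction of positions called correctly."""
    return 1 - n_errors / length_bp


print(classify_site("GGGA", "A"), classify_site("AACC", "A"))
print(f"{sequence_accuracy(8, 1_000_000):.6f}")
```

Under such a rule, an early MDA error that is copied into most later amplicons passes the consensus threshold and appears as a SNP rather than as a heterogeneous site, consistent with the >75% variant frequencies reported above.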

Completely accurate genome sequences can therefore be generated using MDA. An initial input of between 4800 and 23,500 genome copies is required, equivalent to between 1,500,000 and 48,000,000 copies post-MDA. These data also show that complete genome sequences can be derived from DNA extracted from a single well of a 24WT with sufficient coverage (217×) to allow high-quality de novo assembly. This observation greatly reduces the need for extended laboratory passage of strains prior to DNA extraction for genome sequencing.

Assessing the C. trachomatis content of clinical samples 

To determine the feasibility of amplifying genomes directly from clinical samples using MDA, we determined the DNA composition of C. trachomatis-positive clinical samples. Addenbrooke's Hospital in Cambridge and Peterborough City Hospital provided samples from their routine diagnostic service. These included swabs and urines from males and females, with the diagnostic sample taken directly into lysis buffer for use with the Gen-probe Aptima or Abbott m2000 systems. DNA was extracted from these buffers using the Qiagen QIAamp DNA mini kit or the Promega Wizard genomic DNA purification kit and analyzed for total DNA content and C. trachomatis DNA content.

Quantitation showed highly variable total DNA content in the samples, from below the detection limit of 100 pg/µL to a maximum of 161 ng/µL (Table 2). C. trachomatis DNA was present at very low levels, often below the detection limit of 1000 genome copies/µL, and comprised a maximum of 0.6% of the total DNA. This indicates that the samples include a large amount of non-chlamydial DNA from the host or other resident microbiota.
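The percentage of C. trachomatis DNA relates a genome copy number to a total DNA mass. A minimal Python sketch of that conversion follows. It assumes an average mass of about 650 g/mol per double-stranded base pair and a genome length of ~1.04 Mb (in line with the chromosome assemblies in Table 5), ignoring the 7.5-kb plasmid; these constants and the function names are assumptions, not values stated in the text.

```python
AVOGADRO = 6.02214076e23          # molecules per mole
MEAN_BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one double-stranded base pair


def genome_mass_pg(genome_length_bp):
    """Mass of one genome copy in picograms."""
    return genome_length_bp * MEAN_BP_MASS_G_PER_MOL / AVOGADRO * 1e12


def percent_target_dna(genome_copies_per_ul, total_dna_pg_per_ul, genome_length_bp=1_040_000):
    """Percentage of total DNA mass contributed by the target genome."""
    target_pg_per_ul = genome_copies_per_ul * genome_mass_pg(genome_length_bp)
    return 100.0 * target_pg_per_ul / total_dna_pg_per_ul


# GW1 in Table 2: 81,800 genome copies/µL in 15,600 pg/µL of total DNA.
print(f"{genome_mass_pg(1_040_000):.5f} pg per genome")
print(f"{percent_target_dna(81_800, 15_600):.2f}% C. trachomatis DNA")
```

This gives roughly 0.6% for GW1, close to the tabulated 0.5%; the exact conversion used for Table 2 is not stated, so small differences are expected.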

We performed MDA directly on these DNA extracts (1-µL aliquots) to assess the potential for direct amplification of C. trachomatis genomes from a clinical sample. Although an increase in total DNA was seen in most samples and the number of C. trachomatis genome copies increased in some cases, the relative proportion of C. trachomatis DNA fell in many instances (Table 2), suggesting preferential amplification of the other DNA species within these complex samples. To test this reduction in the proportion of C. trachomatis DNA after MDA, we performed an experiment with C. trachomatis DNA spiked into carrier DNA at a range of concentrations. This showed that the percentage reduction occurs in all cases when the target DNA is present over the tested range of 0.1%–21% (Supplemental Table S1).

Table 2. Analysis of DNA extracts from clinical samples

Sample | Source | DNA concentration (pg/µL) | Ct genome copies/µL | Initial % Ct DNA | Post-MDA DNA concentration (pg/µL) | Post-MDA Ct genome copies/µL | Post-MDA % Ct DNA
---|---|---|---|---|---|---|---
GQ1 | Urine | 252 | <1000 | ND | 62.6 (a) | <1000 | ND
GQ2 | Urine | 166 | <1000 | ND | 49.6 (a) | <1000 | ND
GQ3 | Vaginal | <100 | <1000 | ND | 63,800 | <1000 | ND
GQ4 | Urine | 576 | <1000 | ND | 70.8 (a) | <1000 | ND
GQ5 | Cervical | 974 | <1000 | ND | 336,000 | 50,500 | 0.015
GQ6 | Cervical | 746 | <1000 | ND | 40 (a) | <1000 | ND
GQ7 | Vaginal | 1560 | <1000 | ND | 284,000 | <1000 | ND
GQ8 | Cervical | 3280 | 1900 | 0.06 | 39.2 (a) | <1000 | ND
GQ9 | Cervical | 1280 | <1000 | ND | 34.4 (a) | <1000 | ND
GQ10 | Urine | 454 | <1000 | ND | 50.8 (a) | <1000 | ND
GW1 | Urethral | 15,600 | 81,800 | 0.5 | 452,000 | 429,000 | 0.09
GW2 | Vaginal | 139,000 | 87,500 | 0.6 | 528,000 | 13,100 | 0.002
GW3 | Vaginal | <100 | <1000 | ND | 3400 | <1000 | ND
GW4 | Urethral | 17,700 | <1000 | ND | 496,000 | <1000 | ND
GW5 | Vaginal | 161,000 | <1000 | ND | 508,000 | <1000 | ND
GW6 | Urethral | 1580 | <1000 | ND | 474,000 | <1000 | ND
AQ1 | Urine | <100 | <1000 | ND | 2680 | <1000 | ND
AQ2 | Urine | 118 | <1000 | ND | 738,000 | <1000 | ND
AQ3 | Urine | 1380 | <1000 | ND | 640,000 | <1000 | ND
AQ4 | Urine | <100 | <1000 | ND | 742,000 | <1000 | ND
AQ5 | Urine | <100 | <1000 | ND | 30,200 | <1000 | ND
AQ6 | Cervical | 1330 | <1000 | ND | 652,000 | 7600 | 0.001
AQ7 | Urine | <100 | <1000 | ND | 648,000 | <1000 | ND
AQ8 | Urine | 314 | <1000 | ND | 734,000 | <1000 | ND
AQ9 | Urine | 9860 | <1000 | ND | 854,000 | <1000 | ND
AQ10 | Urine | 106 | <1000 | ND | 60,400 | <1000 | ND
AW1 | Vag/Ure | <100 | <1000 | ND | 570,000 | <1000 | ND
AW2 | Vag/Ure | 11,000 | 1000 | 0.009 | 486,000 | 5180 | 0.001
AW3 | Urine | 116 | <1000 | ND | 604,000 | <1000 | ND
AW4 | Vag/Ure | 71,600 | 1050 | 0.001 | 440,000 | 1720 | 0.0004
AW5 | Urine | 33,000 | <1000 | ND | 666,000 | 6540 | 0.0001
AW6 | Urine | 208 | <1000 | ND | 121,000 | <1000 | ND

(Ct) C. trachomatis. (ND) Not determined. (GQ) Gen-probe samples extracted through Qiagen columns. (GW) Gen-probe samples extracted by Wizard. (AQ) Abbott samples extracted through Qiagen columns. (AW) Abbott samples extracted by Wizard.
(a) Samples appear to have failed to amplify.

A second set of clinical samples was obtained from Addenbrooke's Hospital for direct sequencing. These were also discarded C. trachomatis-positive Gen-probe Aptima samples, from which the DNA was extracted using the Qiagen mini kit. Total DNA concentrations showed that several of the samples did not contain the input DNA required for Illumina sequencing (minimum input 1 µg). For the samples that met this criterion, MDA was performed on 1 µL of the extracted DNA. Following quantification, both the pre- and post-amplification samples were sequenced. The resulting reads were mapped to a completed high-quality reference C. trachomatis chromosome (urogenital strain F/SW4, EMBL accession HE601804), and the results are shown in Table 3.

The results show that in all samples the proportion of reads matching regions within the human genome sequence exceeded 92%. The depth of C. trachomatis coverage, as assessed by mapping to a reference genome, was very low and did not represent coverage of the entire genome, making it impossible to call sequence bases accurately. Amplification of the sample did not consistently or substantially improve the results, and in most cases only made the sequencing more uneven across the genome, as indicated by the increased CV values. While it is possible that deeper sequencing of these samples would yield sufficient data to generate accurate sequences, this would be prohibitively expensive and not scalable. These data indicate that an enrichment step is needed to increase the relative percentage of C. trachomatis target DNA in order to obtain accurate and cost-effective C. trachomatis genome sequences directly from clinical samples.
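The cost argument can be made concrete with a back-of-the-envelope estimate of the total sequence yield needed for a target depth when only a small fraction of reads come from the target genome. This Python sketch assumes perfectly uniform coverage and no duplicate reads, so it is a lower bound; MDA-induced unevenness would push the real requirement higher. The 30× target and the read fractions are hypothetical illustration values, and the function name is illustrative only.

```python
def required_yield_gb(target_depth, genome_length_bp, target_read_fraction):
    """Total sequencing yield (Gb) so that target-derived reads reach a given mean depth.

    Assumes uniform coverage and no duplicates, so this is a lower bound.
    """
    if not 0 < target_read_fraction <= 1:
        raise ValueError("target_read_fraction must be in (0, 1]")
    return target_depth * genome_length_bp / target_read_fraction / 1e9


# Hypothetical: 30x over a ~1.04-Mb chromosome.
for fraction in (0.001, 0.05, 0.5):
    print(f"{fraction:>6.1%} target reads -> {required_yield_gb(30, 1_040_000, fraction):8.3f} Gb")
```

Because the required yield scales as 1/fraction, raising the target share of reads from 0.1% to 50% cuts the sequencing needed by a factor of 500, which is the motivation for an enrichment step.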

Developing IMS-MDA for C. trachomatis 

We investigated an affinity-based technique, IMS, for targeted enrichment of C. trachomatis from clinical samples. This technique is appropriate for swabs taken directly into transport medium such as Chlamydia transport medium (CTM) or viral transport medium (VTM), which keep EBs intact. IMS uses antibodies associated with magnetic beads to bind intact Chlamydia, with wash steps designed to remove contaminating material, enabling enrichment of the target species. We used a commercially available anti-Chlamydia mouse IgG primary antibody (IMAGEN Chlamydia, Oxoid) against C. trachomatis lipopolysaccharide (LPS), which binds to all serovars of C. trachomatis and has been tested for cross-reactivity against many other microbial species including Lactobacillus lactis, Mycoplasma spp., Neisseria gonorrhoeae, and Gardnerella vaginalis (IMAGEN Chlamydia booklet; Thornley et al. 1983, 1985). LPS is present at ~34,000 molecules per EB (Su et al. 1990), creating a high-density target for antibody binding. This antibody was used with a sheep anti-mouse IgG secondary antibody conjugated to magnetic beads (Dynabeads, Dynal, Invitrogen), following the manufacturer's protocol (Dynal). Briefly, the primary and secondary antibodies were first allowed to bind to each other, before the test sample was added and a series of washes was performed. MDA was performed directly after IMS, with the first incubation at 95°C used for bacterial lysis and denaturation of the genomic DNA.

To determine the efficacy of this approach, IMS-MDA was performed on a serial dilution of urogenital C. trachomatis strain D/314 from culture, diluted in CTM (Remel M4RT). The number of infectious particles in the input sample was estimated by infection of McCoy cells, and the total number of genome copies present in the input sample was determined by qPCR as an approximation of the number of Chlamydia within each sample. Perhaps unsurprisingly, there was considerable disparity between these values, with the 100-µL sample representing the 10⁻³ dilution containing ~96 infectious particles and ~10,000,000 genome copies. This disparity is likely to reflect the inefficiency with which C. trachomatis infects McCoy cells and the presence of dead or noninfectious Chlamydia that register as genome copies.

Genome copies were assayed after the IMS procedure and after IMS-MDA. For comparison with the values above, the number of genome copies in the 10⁻³ dilution sample after IMS was 3805 ± 248, indicating the number of intact, DNA-containing bacteria recovered by antibody binding. MDA generated sufficient DNA for sequencing (>5,000,000 copies/µL) from samples at dilutions 10⁻¹ to 10⁻⁵ (Fig. 1), even though the number of genome copies recovered from the 10⁻⁵ sample after IMS was below the detection limit. Protocol variations, including use of an alternative primary antibody against MOMP, had no effect on the efficiency of this protocol (data not shown).
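The numbers reported for the 10⁻³ dilution allow two rough ratios to be derived: the fraction of input genome copies recovered by IMS, and the number of genome copies per infectious particle. The Python sketch below simply does that arithmetic; it assumes the pre- and post-IMS qPCR counts are directly comparable, and because the input values are approximate (~), the results are order-of-magnitude estimates only. The function names are hypothetical.

```python
def recovery_fraction(recovered_copies, input_copies):
    """Fraction of input genome copies recovered after IMS."""
    return recovered_copies / input_copies


def copies_per_infectious_particle(genome_copies, infectious_particles):
    """Genome copies per infectious particle, both measured on the same input sample."""
    return genome_copies / infectious_particles


# Approximate values reported for the 100-µL 10^-3 dilution of strain D/314.
INPUT_COPIES = 10_000_000
INFECTIOUS_PARTICLES = 96
POST_IMS_COPIES = 3805

print(f"IMS recovery: {recovery_fraction(POST_IMS_COPIES, INPUT_COPIES):.3%}")
print(f"Genome copies per infectious particle: "
      f"{copies_per_infectious_particle(INPUT_COPIES, INFECTIOUS_PARTICLES):,.0f}")
```

Under these assumptions, IMS recovers only a few hundredths of a percent of the input genome copies, yet the subsequent MDA step still produced enough DNA for sequencing from this dilution.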

This experiment indicates that IMS-MDA is successful at recovering C. trachomatis from CTM, and could thus potentially generate DNA for genome sequencing directly from clinical samples. The success of the approach appears to depend on the initial load of C. trachomatis present in the sample.

Table 3. Analysis of sequencing data from DNA extracts from clinical samples, with and without MDA

Sample | Amplified | DNA concentration (pg/µL) | Ct genome copies/µL | Total yield for tag (kb) | Reads matching Ct genome (%) | Chromosome coverage (%) | Depth of coverage (mean) | Depth of coverage (std)
---|---|---|---|---|---|---|---|---
GA3 | N | 1300 | 1300 | 1,698,872 | 1.1 | 49.4 | 0.9 | 11.7
GA3 | Y | 8000 | <1000 | 1,834,073 | 0.8 | 20.7 | 0.3 | 3.3
GA4 | N | 1300 | <1000 | 1,697,929 | 0.3 | 44.7 | 1.1 | 14.0
GA4 | Y | 27,000 | <1000 | 2,250,286 | 0.1 | 16.1 | 0.6 | 5.3
GA5 | N | 1900 | 2300 | 1,422,125 | 2.8 | 80.0 | 2.1 | 13.3
GA5 | Y | 24,000 | <1000 | 1,610,920 | 0.3 | 8.3 | 0.1 | 2.6
GA9 | N | 8200 | 79,000 | 1,812,022 | 3.0 | 94.5 | 3.5 | 10.7
GA9 | Y | 37,000 | 10,000 | 1,771,792 | 5.2 | 80.6 | 1.9 | 3.6
GA10 | N | 1000 | <1000 | NS | | | |
GA10 | Y | 6000 | <1000 | 1,742,050 | 0.1 | 7.0 | 0.1 | 3.2
GA11 | N | 3700 | 2500 | 1,957,726 | 0.1 | 9.2 | 0.3 | 17.0
GA11 | Y | 55,000 | <1000 | 1,811,183 | 0.2 | 9.8 | 0.2 | 3.1
GA12 | N | 3500 | <1000 | 1,115,405 | 0.1 | 7.0 | 0.2 | 6.0
GA12 | Y | <100 | <1000 | 1,518,803 | 4.7 | 78.2 | 1.8 | 3.6

All samples were run on Illumina HiSeq with a read length of 100 bp and 24 samples per lane. Sample sources are unknown. (Ct) C. trachomatis.

Figure 1. Recovery of C. trachomatis DNA following IMS and IMS-MDA on a serial dilution of C. trachomatis EBs. Post-IMS values are indicated by the gray diamonds and represent total genome copies recovered by IMS. The DNA recovered from the 10⁻⁵ dilution sample was below the detection limit of 1000 genome copies/µL. Post-IMS-MDA values are indicated by black squares and represent genome copies/µL. Error bars indicate standard deviation from duplicate experiments. The dotted gray line indicates the cut-off load, below which IMS-MDA produces insufficient DNA for sequencing.

Complete genome sequences directly from clinical samples 

To test whether IMS-MDA can be applied to clinical samples containing high levels of human cells and other microbiota, we applied the above protocol to discarded routine C. trachomatis-positive swab samples. Eighteen samples in CTM (Remel M4RT), all diagnosed positive by qPCR (Jalal et al. 2006), were obtained from Addenbrooke's Hospital in two batches. All samples were processed by IMS-MDA and quantified by qPCR to estimate the output number of genome copies. From the first batch (seven samples; swab1–3 and swab5–8), all samples were sequenced following standard multiplex Illumina sample preparation and sequencing protocols. From the subsequent batch (11 samples; swabB1–B11), the four samples with the highest amount of C. trachomatis DNA were sequenced. For the analysis of the resulting sequence data, all reads were mapped against the complete genome of STI strain F/SW4 (EMBL accession HE601804).

The mean depth of coverage was highly variable between samples, with samples containing fewer C. trachomatis genome copies post-IMS-MDA providing less C. trachomatis genome data (Table 4). The five samples generating reads covering >99% of the chromosome all showed a mean depth of coverage of >38× (Table 4), allowing confident base calling. These samples all contained >10,000,000 genome copies/µL post-IMS-MDA, whereas swab7 and swabB11 produced 5000 and 300,000 genome copies/µL post-IMS-MDA, respectively, which was insufficient for production of a complete genome. In agreement with the previous data, it seems that in excess of 1,500,000 copies/µL is required to successfully generate genomic data. Use of alternative primary antibodies did not improve recovery of C. trachomatis DNA after IMS-MDA (data not shown). The diagnostic qPCR CT (cycle threshold) value was not found to be indicative of IMS-MDA success, indicating that this clinical assessment of chlamydial load is not relevant in this context: IMS-MDA depends on the presence of intact Chlamydia, whereas the CT value reflects the total amount of chlamydial DNA present in the sample. The integrity of the samples in this respect may depend on sample type, patient-to-patient variation, how the sample was collected, and how the sample was stored in the clinic.
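Because the post-IMS-MDA qPCR result, rather than the diagnostic CT value, predicts success, a simple triage step can decide which amplified samples to send for sequencing. This Python sketch applies the empirical cut-off from the text (in excess of ~1,500,000 genome copies/µL); treating it as a hard threshold is an assumption, and the function names and the "example_pass" sample are hypothetical. swab7 and swabB11 use the values reported above.

```python
# Empirical cut-off suggested by the dilution series and clinical swabs:
# post-IMS-MDA yields in excess of ~1,500,000 genome copies/µL. A guide, not a hard rule.
MIN_POST_MDA_COPIES_PER_UL = 1_500_000


def enough_for_sequencing(post_mda_copies_per_ul, threshold=MIN_POST_MDA_COPIES_PER_UL):
    """True if the post-IMS-MDA yield exceeds the threshold."""
    return post_mda_copies_per_ul > threshold


def triage(samples, threshold=MIN_POST_MDA_COPIES_PER_UL):
    """Split a {name: copies/µL} mapping into sorted (sequence, skip) name lists."""
    sequence = sorted(name for name, copies in samples.items()
                      if enough_for_sequencing(copies, threshold))
    skip = sorted(name for name in samples if name not in sequence)
    return sequence, skip


samples = {"swab7": 5_000, "swabB11": 300_000, "example_pass": 12_000_000}
print(triage(samples))
```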

From 18 clinical samples, five complete genome sequences 
were generated. These data indicate that IMS-MDA can be used 
on clinical samples to produce a reliable genome sequence. These 
genomes represent the first bacterial genomes produced directly 
from clinical samples, without culture. 

Assembly of novel genome sequences generated by IMS-MDA 

Sequence data generated using MDA have a more uneven depth of coverage than data generated from cultured samples, as seen above, owing to semi-random over-amplification of regions of the genome (Pinard et al. 2006). This is reflected in chromosome CV values (Table 4), as defined above, and can be visualized by mapping read data from a cultured control strain (Table 1, top row) against a reference genome of C. trachomatis and comparing this with mapped read data from the five successfully sequenced clinical samples above (Supplemental Fig. S1).

While genome variation can be accurately determined by mapping against an appropriate reference and calling SNPs, uneven genomic coverage is potentially problematic for de novo genome assembly, as it violates the assumptions of many assembly algorithms. Assembly of data sequenced following IMS-MDA of clinical samples is further complicated by the potential presence of low-level contaminant DNA, either from the sample or from contaminated reagents (Blainey and Quake 2010).

With recent rapid advances in the field of single-cell sequencing (for review, see Lasken 2012), it is becoming more commonplace to assemble data produced by sequencing MDA-amplified DNA. This has led to the development of informatics techniques that allow improved assembly of such sequence data with uneven coverage, including the SPAdes assembly software (Bankevich et al. 2012). We applied the single-cell option of the SPAdes assembler to produce de novo assemblies of the five clinical samples sequenced after IMS-MDA (above). For comparison, we also assembled the above data from the cultured control L2 strain using the same program, but without the single-cell option (Table 5).
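The two assembly modes differ only in whether SPAdes is run with its single-cell flag. The Python sketch below builds the two command lines for paired-end reads; it uses the SPAdes options `--sc` (single-cell mode), `-1`/`-2` (paired reads), `-t` (threads), and `-o` (output directory). The SPAdes version and any other parameters used in this study are not stated, and the file names and thread count are hypothetical.

```python
import shlex


def spades_command(reads_1, reads_2, out_dir, single_cell=True, threads=4):
    """Build a SPAdes command line for paired-end reads.

    single_cell=True adds --sc (single-cell mode, intended for MDA-amplified data);
    single_cell=False gives the standard mode, as used for the cultured control.
    """
    cmd = ["spades.py"]
    if single_cell:
        cmd.append("--sc")
    cmd += ["-1", reads_1, "-2", reads_2, "-t", str(threads), "-o", out_dir]
    return cmd


# Hypothetical file names; run with subprocess.run(cmd, check=True) once SPAdes is installed.
clinical = spades_command("swab5_1.fastq.gz", "swab5_2.fastq.gz", "swab5_spades")
control = spades_command("L2_1.fastq.gz", "L2_2.fastq.gz", "L2_spades", single_cell=False)
print(shlex.join(clinical))
print(shlex.join(control))
```

Building the command as a list, rather than as a single shell string, avoids quoting problems with unusual file names when it is passed to `subprocess.run`.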

IMS enriches samples for C. trachomatis but does not produce pure samples of the target bacterium. This is apparent within the assemblies: a number of small contigs from contaminant species were derived from all five clinical samples. These include contigs with similarity to Lactobacillus and Gardnerella spp., two common commensals of the vagina (Ravel et al. 2011), and Ralstonia spp., a known kit contaminant (Kulakov et al. 2002; Pride et al. 2012). The high level of conservation of the C. trachomatis genome across the species allows these contaminant contigs to be easily identified and removed; we achieved this by using abacas (Assefa et al. 2009) to order the contigs against a reference C. trachomatis genome (F/SW4).
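The principle behind removing contaminant contigs by comparison with a conserved reference can be shown with a simple k-mer screen. This is a swapped-in technique for illustration only: the study used abacas, which orders contigs by alignment, whereas the Python sketch below keeps a contig if a sufficient share of its k-mers (on either strand) occur in the reference. The k-mer size, the 0.5 threshold, the function names, and the toy sequences are all hypothetical.

```python
def kmers(seq, k):
    """Set of all k-length substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def shared_kmer_fraction(contig, ref_kmers, k):
    """Fraction of a contig's k-mers that also occur in the reference k-mer set."""
    contig_kmers = kmers(contig, k)
    if not contig_kmers:
        return 0.0  # contig shorter than k
    return sum(1 for km in contig_kmers if km in ref_kmers) / len(contig_kmers)


def filter_contigs(contigs, reference, k=21, min_fraction=0.5):
    """Keep contigs whose k-mers largely match the reference on either strand."""
    ref_kmers = kmers(reference, k) | kmers(revcomp(reference), k)
    return {name: seq for name, seq in contigs.items()
            if shared_kmer_fraction(seq, ref_kmers, k) >= min_fraction}


# Toy example with a short "reference" and k=5 (real use would need a larger k).
reference = "ACGTTGCATGCCTAGGATCCGATTACAGGCT"
contigs = {"on": reference[3:20], "rc": revcomp(reference[5:25]), "off": "T" * 18}
print(sorted(filter_contigs(contigs, reference, k=5)))
```

Such a screen works here mainly because the C. trachomatis genome is so conserved; for a more variable species, a looser threshold or an alignment-based approach would be needed.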

The assembly statistics for the control and clinical samples 
are shown in Table 5. Assembly of the sequence data from the 
cultured control sample produced two chromosomal contigs to- 
taling 1,032,628 bp. The only contig breaks coincided with the 
two identical rRNA operons, which collapse into one contig during 
assembly of short read data. Despite the uneven coverage in the 
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Table 5. Assembly of IMS-MDA clinical swab sequence data 

Sample Control L2 Swab5 Swab6 SwabBI SwabB4 SwabB8 a 



SPAdes single cell option used 


N 


Y 


Y 


Y 


Y 


Y 


Total length 


1,396,321 


2,050,692 


1,170,823 


3,994,769 


8,644,732 


38,233,648 


Number of contigs 


4597 


2002 


282 


6419 


13,597 


48,345 


Total length 


1,032,628 


1,042,675 


1,037,506 


1,044,130 


1,036,854 


1,037,223 


Number of contigs 


2 


2 


2 


9 


21 


5 


Mean length 


516,314 


521,338 


518,753 


116,014 


49,374 


207,445 


Std dev lengths 


172,783 


499,242 


496,657 


1 66,290 


55,143 


206,570 


Max length 


689,097 


1,020,579 


1,015,410 


453,473 


192,642 


539,668 


Min length 


343,531 


22,096 


22,096 


198 


248 


284 


N50 


689,097 


1,020,579 


1,015,410 


326,21 1 


83,787 


539,668 



Before contamination removal 

Contigs matching the C. trachomatis 
chromosome 



a One chromosomal contig contained a misassembly forming a large inverted repeat which was manually corrected. 



clinical samples, assemblies of these samples produced between 
two and 21 contigs covering the chromosomes. Total assembly lengths, following the removal of contaminant contigs, were between 1,032,628 bp and 1,044,130 bp, consistent with the expected size of the C. trachomatis chromosome. The most fragmented chromosome assembly was produced from the swabB4 data, which assembled into 21 contigs of between 248 and 192,642 bp. This sample had the lowest mean depth of coverage of the five clinical samples and the highest chromosome CV (0.63; Table 4), indicating that these data have the most variable coverage within this data set. In all cases the plasmid was present in the assembly, although it was often represented as a number of contigs owing to the extremely high plasmid coverage resulting from MDA (data not shown). Following assembly it was possible to derive genotypes for
these five strains, which indicated that swab6 and swabB4 have ompA genotype E and swab5, swabB1, and swabB8 have ompA genotype F (Table 4). These strains are most closely related to the sequenced strains E/SotonE4 (EMBL accession HE601802) and F/SW5 (EMBL accession HE601805), respectively.
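The chromosome-level statistics in Table 5 and the coverage CV quoted for swabB4 follow standard definitions, which the sketch below spells out. It assumes contig lengths are available as a list of integers and per-base depths as a list of numbers; the function names are illustrative and not part of the authors' pipeline. The standard deviations in Table 5 match the population form (for Control L2, (689,097 − 343,531)/2 = 172,783), so the sketch uses `statistics.pstdev`; whether Table 4's CV uses the population or sample form is an assumption, although over a ~1 Mb chromosome the difference is negligible.

```python
import statistics
from typing import Dict, Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that contigs of length >= L together cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs supplied")


def contig_summary(lengths: Sequence[int]) -> Dict[str, int]:
    """Table 5-style statistics; the standard deviation is the population form (pstdev)."""
    return {
        "total_length": sum(lengths),
        "n_contigs": len(lengths),
        "mean_length": round(statistics.mean(lengths)),
        "sd_lengths": round(statistics.pstdev(lengths)),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "n50": n50(lengths),
    }


def coverage_cv(depths: Sequence[float]) -> float:
    """Coefficient of variation of per-base read depth (SD / mean)."""
    return statistics.pstdev(depths) / statistics.mean(depths)


# Usage: the two chromosomal contigs of the Control L2 assembly (Table 5)
print(contig_summary([689_097, 343_531]))
```

N50 rather than the mean is the usual headline figure because a handful of very short contigs (such as the 198 bp contig in SwabB1) drag the mean down without materially changing how much of the chromosome sits in long contigs.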

Additional applications of IMS-MDA 

The full IMS-MDA protocol can be performed on samples within 5 h with very little hands-on time. This technique can therefore be used on samples from cell culture to generate DNA for sequencing, as an alternative to the standard protocol, which involves differential centrifugation to remove host cell debris and concentrate the Chlamydia, followed by genomic DNA extraction and resuspension. We tested both protocols in parallel on eight strains of C. trachomatis from urogenital clinical samples. Each strain was grown in tissue culture in five wells of a 24-well tissue culture tray (24WT); two wells were pooled for IMS-MDA and three wells were pooled for genomic DNA extraction, and both preparations were then sequenced on the Illumina platform and mapped to a reference genome.

We found that all samples produced 100% genome coverage, with the extracted samples producing a minimum mean depth of coverage of 37.9× (Supplemental Table S2). Comparison of the sequenced IMS-MDA samples with the extracted samples of the same strains found no SNPs, even at a mean depth of coverage as low as 14.4× after IMS-MDA. The sequence data were used to determine genotype (Supplemental Table S2), with these samples characterized as ompA genotype E (four samples), G (three samples), and K (one sample). Again, heterogeneous sites were analyzed and were found at similar levels for both preparation methods of the same strain. Strain R3059 showed an increased level of heterogeneity after IMS-MDA, which may be attributed to the low yield of C. trachomatis genome copies, itself indicating a low input of DNA before MDA, as seen above. One sample (R33512) showed very high levels of heterogeneity across the genome (>4500 sites) in sequence data after both extraction and IMS-MDA, suggestive of a mixed infection. In this sample, a major variant appears to be present in ~85% of the reads, and a minor variant in ~15% of the reads. While mapping or assembly will give information on the dominant strain present, we can also resolve mixed infections bioinformatically when one strain is sufficiently more abundant than the other. In this case, it was possible to separate out the variants based on their relative frequency (data not shown), which indicates that the two strains fall in the different trachoma clades T2 (ompA genotype K) and T1 (ompA genotype E), which are separated by 4860 SNPs (Harris et al. 2012). Together, these data indicate that IMS-MDA is a rapid way of generating genomic DNA that gives accurate and interpretable genome sequence data.
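The frequency-based separation of a roughly 85:15 mixture can be illustrated with a short sketch. The paper does not describe its procedure in detail, so this is one plausible approach under stated assumptions: allele counts per reference position have already been tabulated from the mapped reads, a site is called heterogeneous when its second allele reaches a hypothetical threshold of 5%, and at each such site the more frequent allele is assigned to the major strain and the less frequent one to the minor strain.

```python
from typing import Dict, Tuple

AlleleCounts = Dict[str, int]  # e.g. {"A": 85, "G": 15} at one reference position


def split_mixed_infection(
    sites: Dict[int, AlleleCounts],
    min_minor_freq: float = 0.05,  # hypothetical threshold for calling a site heterogeneous
) -> Tuple[Dict[int, str], Dict[int, str], float]:
    """Split a two-strain mixture by allele frequency.

    Returns (major_haplotype, minor_haplotype, mean_minor_frequency). At each
    heterogeneous site the most frequent allele goes to the major strain and the
    second most frequent to the minor strain; elsewhere both share the top allele.
    """
    major: Dict[int, str] = {}
    minor: Dict[int, str] = {}
    minor_freqs = []
    for pos, counts in sorted(sites.items()):
        depth = sum(counts.values())
        if depth == 0:
            continue  # no reads: leave the position undetermined
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        major[pos] = minor[pos] = ranked[0][0]
        if len(ranked) > 1 and ranked[1][1] / depth >= min_minor_freq:
            minor[pos] = ranked[1][0]
            minor_freqs.append(ranked[1][1] / depth)
    mean_minor = sum(minor_freqs) / len(minor_freqs) if minor_freqs else 0.0
    return major, minor, mean_minor


# Usage with made-up counts at four positions
sites = {
    100: {"A": 85, "G": 15},
    200: {"C": 90},
    300: {"C": 84, "T": 16},
    400: {"G": 98, "A": 2},  # below threshold: treated as sequencing noise
}
major, minor, frac = split_mixed_infection(sites)
het_sites = [p for p in major if major[p] != minor[p]]
print(het_sites, round(frac, 3))  # [100, 300] 0.155
```

This only works when the two strains differ clearly in abundance: at close to 50:50, the alleles at different sites cannot be reliably assigned to the same strain from frequencies alone.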

IMS-MDA can also be used on C. trachomatis samples that have been archived at -80°C: both original clinical samples in CTM that have been stored since diagnosis and passaged samples that have been archived and are no longer viable. We have obtained complete genome sequences from seven LGV strains of C. trachomatis that had been stored as diagnostic samples for up to 8 yr (47 samples assayed by IMS-MDA), and from three LGV strains that had been passaged one to five times prior to storage up to 16 yr previously (15 samples assayed by IMS-MDA; Supplemental Table S3). These strains were determined to be ompA genotypes L1 and L2b, indicating the wide applicability of this technique through the use of a characterized, diagnostic antibody. While the viability of the latter samples was not tested, the former clinical samples were known to be nonviable, having been heated to 56°C for viral inactivation prior to diagnosis by NAAT. The IMS-MDA protocol gave rapid access to the genomic information within these archived samples, providing data from otherwise inaccessible samples.

Practical considerations for the use of IMS-MDA 

IMS-MDA has potential for use in non-laboratory settings, as the IMS reagents are stable, only simple equipment is necessary, and the incubations can be carried out at any temperature between 4°C and 30°C (data not shown). The MDA component, however, requires storage at -80°C and more complex and precise incubations, preferably using a PCR machine. By performing IMS on replicate samples and storing the resulting washed beads at a range of temperatures for 7 and 14 d prior to MDA, we determined that the C. trachomatis DNA is stable post-IMS over these time periods. Recovery of sufficient DNA for sequencing was possible after storage at 20°C for 14 d, although maximum recovery was achieved after storage at -20°C (Supplemental Fig. S2). Additionally, for high-throughput sample processing, IMS-MDA can be carried out in 96-well format with equivalent yield (data not shown). IMS-MDA reagents cost approximately US$5 per sample, with low up-front equipment costs.

We also determined whether C. trachomatis remains viable after being subjected to the IMS protocol. Infection of a McCoy cell monolayer was attempted using both post-IMS bead-bound C. trachomatis and C. trachomatis eluted from the beads after IMS. While viability was greatly reduced under both conditions compared with the control sample pre-IMS, inclusions were seen in both (Supplemental Fig. S3), indicating that some infectivity remains.

Discussion 

Current diagnostic techniques for many bacterial pathogens provide information on the presence or absence of a target species, occasionally generating limited subtyping information. While this is generally sufficient for patient treatment and basic epidemiology, it provides little detailed resolution relating to the causal agent. To study evolution and epidemics, investigate outbreaks, trace sources, and potentially track the spread of emergent drug-resistant strains, greater resolution is required. For maximum phylogenetic resolution and strain identification, it is necessary to have the full genomic sequence (Harris et al. 2012). In the post-genomic age of clinical medicine, it is becoming possible as well as desirable to derive the maximum level of information from infectious agents (Gardy et al. 2011; Eyre et al. 2012; Köser et al. 2012b; Snitkin et al. 2012). While many clinically relevant bacteria are routinely cultured, some pathogens are difficult and time consuming to grow in vitro, and many environmental bacteria remain uncultured. Methods that generate genomic information without the need for culture offer great benefits: they are applicable in principle to all bacteria, and they speed up clinical response. We have developed a rapid, culture-independent method for generating DNA for genome sequencing from a targeted species, direct from clinical samples. We have shown IMS-MDA to be an accurate, high-throughput, inexpensive, and low-tech methodology with high potential for transfer to other bacteria.

C. trachomatis, with its small genome and obligate intracellular developmental cycle, is an excellent model with which to validate this approach. With appropriate ethical consent, discarded clinical samples can be used for research purposes post-diagnosis. These samples contain very low levels of C. trachomatis and high levels of other material, including human cells and other microbiota inhabiting the same body site. We have shown that it is possible to generate sufficient DNA for accurate whole-genome sequencing by combining targeted enrichment using IMS with WGA using MDA. In addition, we have shown that IMS-MDA can be put to alternative uses, including the rapid generation of whole genomes from limited culture volumes and from irreplaceable archival samples. While this approach does not generate data from every sample, its current success rate of 15%-30% is respectable, and in many cases it will offer the only possible way to generate genome sequence from complex or delicate samples.

This approach opens up new avenues of investigation for genomic research into difficult-to-culture and fastidious bacteria. Using alternative antibodies or aptamers, this technique has great potential for use with other organisms that may be present at low load in clinical and environmental samples and that present culture challenges or require specialist growth conditions, such as a high containment level. We anticipate that this technique will be modified for successful use with many other organisms.

Methods 

C. trachomatis strains, samples, and cell culture 

Clinical samples and archived samples of C. trachomatis for analysis and IMS were kindly provided by Addenbrooke's Hospital, Cambridge, UK; the WHO Collaborating Center for Gonorrhoea and Other STIs, Örebro, Sweden; the Health Protection Agency, Colindale, UK; and the Center for HIV and Sexually Transmitted Infections at the National Institute for Communicable Diseases, Johannesburg, South Africa. Strain D/314, from a cervical sample, was provided by Dr. Harry Mallinson, Consultant Clinical Scientist, University Hospital Aintree NHS Foundation Trust, Liverpool, and strain C/TW3 was obtained from the American Type Culture Collection (ATCC VR-1477).

The study was approved by the National Research Ethics Service Committee East of England, Cambridge South (REC reference: 11/EE/0166). On arrival at the Wellcome Trust Sanger Institute, Cambridge, UK, all discarded clinical samples used in this research underwent bead beating or treatment with Triton (Sigma-Aldrich) at a final concentration of 1% to lyse any human cells present. Any human sequence data produced were removed within the sequencing system; therefore, no analysis or archiving of human sequence data was carried out. Because human sequence data were removed prior to genetic analysis, no ethical approval was required for the use of extracted DNA, even though the samples originally contained human DNA. All the work done in this study fell outside the requirements of the Human Tissue Act as it applies in England, Wales, and Northern Ireland.

Cell culture was performed using McCoy cells as described 
(Seth-Smith et al. 2009), in culture volumes of 1 mL in 24-well
tissue culture trays (24WT) and 10 mL in T75 tissue culture flasks. 

DNA extraction, amplification, and quantification 

Genomic DNA was prepared as previously described (Seth-Smith et al. 2009). WGA was performed using the illustra GenomiPhi V2 kit (GE Healthcare), using 1 µL input DNA as per the manufacturer's instructions. Total DNA was quantified using a Qubit fluorometer (Life Technologies), with broad range or high sensitivity reagents as appropriate. To determine the level of C. trachomatis genomic DNA present, a TaqMan qPCR method targeting the single-copy chromosomal ompA gene was used (Jalal et al. 2006), performed on a StepOnePlus real-time PCR system (Applied Biosystems) in 96-well format. TaqMan Fast Advanced reagents (Applied Biosystems) were used according to the manufacturer's instructions with 1 µL input DNA in a total reaction volume of 20 µL. Samples were heated to 50°C for 2 min, then 95°C for 20 sec, followed by 40 cycles of 95°C for 1 sec and 60°C for 20 sec. Standards were created using a serially diluted PCR product covering the assay region (forward primer 5'-CGGAATTGTGCATTTACGTG-3'; reverse primer 5'-CTACGCTGAGGACGGTAAGC-3'), quantified by Qubit as above. Readings were analyzed as genome copies per microliter. Before this assay was adopted, qPCR used Fast SYBR Green master mix (Applied Biosystems) and the primers HJ-Plasmid-1: 5'-AACCAAGGTCGATGTGATAG-3' and HJ-Plasmid-2: 5'-TCAGATAATTGGCGATTCTT-3' (Jalal et al. 2006).
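Turning the Qubit-quantified PCR product into copy-number standards and reading unknowns off the resulting standard curve is routine qPCR arithmetic, sketched below. Assumptions: an average mass of 650 g/mol per base pair of double-stranded DNA (a common approximation), and an amplicon length supplied by the user, since the product size is not stated here. The Ct values in the usage example are idealized and made up, not data from this study; real analysis software on the StepOnePlus performs an equivalent fit.

```python
import math
from typing import Sequence, Tuple

AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # approximate mass of one double-stranded base pair


def copies_per_ul(conc_ng_per_ul: float, amplicon_bp: int) -> float:
    """Copies/uL of a dsDNA standard from its mass concentration (e.g. a Qubit reading)."""
    return conc_ng_per_ul * 1e-9 * AVOGADRO / (amplicon_bp * BP_MASS_G_PER_MOL)


def fit_standard_curve(copies: Sequence[float], cts: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    mx, my = sum(xs) / len(xs), sum(cts) / len(cts)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, my - slope * mx


def efficiency(slope: float) -> float:
    """Amplification efficiency; 1.0 (100%) corresponds to a slope of about -3.32."""
    return 10 ** (-1.0 / slope) - 1.0


def genome_copies_per_ul(ct: float, slope: float, intercept: float) -> float:
    """Invert the curve; ompA is single copy, so target copies equal genome copies."""
    return 10 ** ((ct - intercept) / slope)


# Usage with an idealized, made-up series of standards (10^1 to 10^6 copies/uL)
standards = [10.0 ** k for k in range(1, 7)]
cts = [38.0 - 3.32 * math.log10(c) for c in standards]
slope, intercept = fit_standard_curve(standards, cts)
print(round(slope, 2), round(intercept, 2), round(efficiency(slope), 3))
```

Targeting a single-copy chromosomal gene is what makes the result interpretable as genome copies; the earlier plasmid-targeted assay would instead report plasmid copies, which vary per genome.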

Sample sequencing 

When samples were sequenced following MDA, the volume of each MDA sample remaining after total DNA and C. trachomatis DNA quantification (18 µL) underwent Illumina sequencing. All samples were sequenced paired-end on GAII or HiSeq machines, with read lengths of 54, 75, or 100 bp.

Sequence data analysis 

For clinical samples, reads mapping to human DNA were enumerated and then removed by an automated pipeline prior to data release to researchers and public archives. Human data are defined as any sequence template for which either the forward or the reverse read aligns to the 1000 Genomes Project phase one human reference sequence (human_g1k_v37) using the Burrows-Wheeler Aligner (BWA; Li and Durbin 2009) run with the -q15 parameter. The sequence data were submitted to ENA with the accession numbers given in Supplemental Table S4.
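The template-level rule above (remove the whole pair if either mate aligns to the human reference) can be sketched as follows. This is not the Sanger pipeline itself; it assumes the human alignments are available as SAM text, and the function names are hypothetical. It relies only on the SAM FLAG bit 0x4 ("segment unmapped") and on mates sharing a QNAME, both defined by the SAM specification.

```python
from typing import Iterable, List, Set, Tuple


def human_templates(sam_lines: Iterable[str]) -> Set[str]:
    """QNAMEs of templates with at least one mate aligned to the human reference.

    A SAM record whose FLAG lacks bit 0x4 ("segment unmapped") is aligned; one
    aligned mate is enough to flag the whole template.
    """
    names: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header records
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x4:
            names.add(qname)
    return names


def remove_human(pairs: Iterable[Tuple[str, str, str]], human: Set[str]) -> List[Tuple[str, str, str]]:
    """Keep only (name, read1, read2) pairs whose template was not flagged as human."""
    return [pair for pair in pairs if pair[0] not in human]


# Usage with a toy SAM excerpt (columns after MAPQ omitted for brevity)
sam = [
    "@SQ\tSN:1\tLN:249250621",
    "r1\t99\t1\t10001\t60",    # both mates aligned
    "r1\t147\t1\t10101\t60",
    "r2\t73\t1\t50001\t37",    # first mate aligned, second unmapped
    "r2\t133\t1\t50001\t0",
    "r3\t77\t*\t0\t0",         # neither mate aligned
    "r3\t141\t*\t0\t0",
]
human = human_templates(sam)
print(sorted(human))  # ['r1', 'r2']
```

Discarding the whole template, rather than only the aligned mate, is the conservative choice: an unaligned mate of a human read is still most likely human.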

Mapping of the resulting data against a relevant reference C. trachomatis genome was then carried out using SMALT (http://www.sanger.ac.uk/resources/software/smalt/) as previously described (Harris et al. 2010). As the rRNA operon is present in two copies in the C. trachomatis genome, reads will not map uniquely to these regions; thus, to accurately calculate the proportion of the reference covered by sequence reads, we employed the SMALT option to randomly map reads that have two or more best mappings with the same alignment score. Assignment to ompA genotypes was performed by mapping reads against a database of known ompA types.
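Why random placement matters for the coverage figure can be shown with a toy model. This is an illustration of the behaviour described, not SMALT's implementation: each read is given a hypothetical list of candidate placements with alignment scores, equal-best placements (as for the two rRNA operons) are resolved at random, and breadth and mean depth are then computed over the reference. If ambiguous reads were discarded instead, both repeat copies would appear uncovered and the proportion of the reference covered would be underestimated.

```python
import random
from typing import Sequence, Tuple

Hit = Tuple[int, int]  # (0-based start position, alignment score) for one candidate placement


def place_read(hits: Sequence[Hit], rng: random.Random) -> int:
    """Choose the best-scoring placement; equal-best placements are chosen at random."""
    best = max(score for _, score in hits)
    tied = [start for start, score in hits if score == best]
    return rng.choice(tied)


def breadth_and_depth(starts: Sequence[int], read_len: int, genome_len: int) -> Tuple[float, float]:
    """Fraction of reference bases covered by >= 1 read, and mean depth, for fixed-length reads."""
    depth = [0] * genome_len
    for s in starts:
        for i in range(s, min(s + read_len, genome_len)):
            depth[i] += 1
    covered = sum(1 for d in depth if d > 0)
    return covered / genome_len, sum(depth) / genome_len


# Usage: a toy 40-bp "genome" with two identical 10-bp repeats starting at 10 and 30
rng = random.Random(42)
reads = [[(0, 50)], [(10, 50), (30, 50)], [(10, 50), (30, 50)], [(20, 50)]]
starts = [place_read(hits, rng) for hits in reads]
print(starts, breadth_and_depth(starts, read_len=10, genome_len=40))
```

With enough reads, random assignment sends roughly half of the ambiguous reads to each copy, so both operons receive coverage.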

De novo assembly was carried out using the SPAdes genome assembler (Bankevich et al. 2012). For the control sample, for which DNA was extracted from a cultured sample, the assembly was carried out without the single-cell option. For clinical samples the single-cell option was used to account for the uneven distribution of read coverage across the genome caused by MDA. To remove contigs representing contamination, the contigs in each assembly were aligned to a reference C. trachomatis genome (F/SW4, EMBL accession HE601804) using ABACAS (Assefa et al. 2009). A single misassembly was identified in the assembly of swabB8 and corrected manually. This misassembly formed a large inverted repeat of an entire contig >300 kb, for which there was no supporting evidence in the raw data. The cause of this misassembly is being investigated by the authors of SPAdes.
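The rule for choosing the assembly mode can be captured in a small helper that builds the command line. This is a sketch: the file and directory names are hypothetical, and the SPAdes version used in the study is not stated here. The options shown (`-1`/`-2` for paired reads, `-o` for the output directory, `--sc` for single-cell mode) are standard `spades.py` options, but confirm them against `spades.py --help` for the installed release. The ABACAS contamination-filtering step is not shown.

```python
import shlex
from typing import List


def spades_command(reads_1: str, reads_2: str, out_dir: str, mda_amplified: bool) -> List[str]:
    """Build a paired-end SPAdes command; single-cell mode (--sc) only for MDA-derived data."""
    cmd = ["spades.py", "-1", reads_1, "-2", reads_2, "-o", out_dir]
    if mda_amplified:
        # MDA produces highly uneven coverage, which SPAdes' single-cell mode is designed for
        cmd.insert(1, "--sc")
    return cmd


# Usage with hypothetical file names: cultured control vs. IMS-MDA clinical swab
print(shlex.join(spades_command("control_1.fastq", "control_2.fastq", "control_asm", False)))
print(shlex.join(spades_command("swab5_1.fastq", "swab5_2.fastq", "swab5_asm", True)))
```

Returning an argument list rather than a single string means it can be passed directly to `subprocess.run(cmd, check=True)` without shell quoting issues.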

Analysis of clinical samples 

Discarded positive clinical samples, collected into either Gen-Probe Aptima (Hologic Gen-Probe) or Abbott multi-Collect (Abbott) lysis buffers, were obtained from Addenbrooke's Hospital and Peterborough City Hospital. For initial analysis, buffer exchange was performed either using the Qiagen DNA blood mini kit (Qiagen), with an elution volume half that of the input volume, doubling the DNA concentration, or using the Wizard genomic DNA purification kit (Promega), with the DNA resuspended in a small volume to give a 12.5× concentration relative to the clinical sample. One microliter of each of these samples was used in DNA quantification or amplification as above.

MDA was performed on a series of samples in which C. trachomatis genomic DNA was spiked at varied concentrations into Erwinia carotovora carrier genomic DNA. The exact input proportions (0.4%-21% C. trachomatis DNA) and post-MDA DNA quantities were calculated using Qubit and qPCR quantification, as described above.
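Expressing the spiked inputs and the MDA products as a percentage of C. trachomatis DNA requires combining the two measurements: qPCR gives genome copies, while Qubit gives total DNA mass. A minimal sketch, assuming 650 g/mol per base pair and a chromosome of ~1.04 Mb (in line with the assemblies in Table 5), and ignoring the plasmid, whose copy number varies; the readings in the usage line are made up.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # approximate mass of one double-stranded base pair
CT_GENOME_BP = 1_040_000     # approximate chromosome size, in line with the Table 5 assemblies


def ct_ng_per_ul(genome_copies_per_ul: float, genome_bp: int = CT_GENOME_BP) -> float:
    """Mass concentration of C. trachomatis chromosomal DNA implied by a qPCR copy number."""
    grams_per_genome = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO  # about 1.1 fg
    return genome_copies_per_ul * grams_per_genome * 1e9


def ct_percent(genome_copies_per_ul: float, total_ng_per_ul: float) -> float:
    """Percentage of total DNA (Qubit) that is C. trachomatis chromosome (qPCR); plasmid ignored."""
    return 100.0 * ct_ng_per_ul(genome_copies_per_ul) / total_ng_per_ul


# Usage with made-up readings: 1e5 genome copies/uL in 1 ng/uL total DNA
print(round(ct_percent(1e5, 1.0), 1))  # 11.2
```

Applying the same calculation before and after MDA shows whether amplification has enriched or diluted the C. trachomatis fraction relative to the carrier DNA.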

Prior to sequencing, a second set of Gen-Probe Aptima samples was obtained. From these, one aliquot of 500 µL was ethanol precipitated and resuspended in 50 µL autoclaved purified water, and a second aliquot of 400 µL was processed with the Qiagen DNA blood mini kit with elution in 100 µL autoclaved purified water. These were analyzed or amplified as above, and the remaining volumes were sequenced as above.
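The buffer-exchange steps above differ mainly in how much they concentrate the sample, which determines how much original clinical material is represented in the 1 µL used for quantification or MDA. The sketch below computes this from the stated volumes under the idealized assumption of complete DNA recovery; the `recovery` parameter is a hypothetical knob for modelling losses, which were not measured here.

```python
def concentration_factor(input_ul: float, output_ul: float, recovery: float = 1.0) -> float:
    """Fold change in DNA concentration when input_ul of sample ends up in output_ul.

    recovery is the fraction of DNA retained (1.0 = no loss, an idealization).
    """
    return recovery * input_ul / output_ul


def sample_equivalent_ul(used_ul: float, factor: float) -> float:
    """Volume of original clinical sample represented by used_ul of the concentrate."""
    return used_ul * factor


# Volumes stated in the text, assuming complete recovery
factors = {
    "Qiagen kit, elution in half the input volume": concentration_factor(2, 1),
    "Wizard kit": 12.5,  # stated directly as a 12.5x concentration
    "ethanol precipitation, 500 uL -> 50 uL": concentration_factor(500, 50),
    "Qiagen kit, 400 uL -> 100 uL": concentration_factor(400, 100),
}
for method, f in factors.items():
    print(f"{method}: {f:g}x; 1 uL corresponds to {sample_equivalent_ul(1, f):g} uL of sample")
```

In practice, purification losses mean the true factors will be lower than these upper bounds.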

Immunomagnetic separation and MDA 

Where clinical, cultured, and archived samples underwent IMS-MDA, samples were harvested and bead-beaten prior to IMS, as per the DNA extraction protocol. Clinical samples from Addenbrooke's Hospital were supplied in Remel MicroTest M4RT (Thermo Fisher); those from the HPA were supplied in a range of collection buffers, including Remel M4RT. Clinical samples were routinely heated to 56°C for 15 min before use for viral inactivation. The magnetic beads (3 µL per sample; Dynabeads M-280 Sheep Anti-Mouse IgG, Invitrogen) were washed twice in isotonic PBS with 0.05% Tween 20 (PBST) as per the manufacturer's instructions. Primary antibody (0.3 µL per sample of IMAGEN Chlamydia, Oxoid) was added to the Dynabeads, with 10 volumes of PBST. Binding of primary antibody to beads was performed at 20°C, shaking at 200 rpm, for 1-20 h. Excess primary antibody was then removed with two washes of the beads in PBST using a DynaMag (Invitrogen), and the beads were resuspended in PBST to 50 µL per sample. To each 50 µL aliquot of beads, between 20 and 200 µL of sample was added, depending on sample availability. Binding was performed at 20°C, shaking at 200 rpm, for 1-20 h. Beads were washed twice with PBST, using the DynaMag, to remove any contaminating material. After removal of the final buffer wash, GenomiPhi amplification was performed on the full amount of remaining magnetic beads, without elution. Where samples were stored and shipped after IMS but prior to amplification, the dry beads were treated at 95°C for 5 min, then stored at -20°C and shipped with ice packs.
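For batch processing (for example, the 96-well format mentioned earlier), the per-sample IMS recipe can be scaled up in advance. This is a sketch with two labelled assumptions: "10 volumes of PBST" is read as ten times the combined bead and antibody volume, and a hypothetical 10% pipetting overage is added by default. Only the per-sample quantities (3 µL beads, 0.3 µL antibody, 50 µL resuspension) come from the protocol.

```python
from dataclasses import dataclass

BEADS_UL = 3.0        # Dynabeads M-280 Sheep Anti-Mouse IgG per sample
ANTIBODY_UL = 0.3     # IMAGEN Chlamydia primary antibody per sample
RESUSPEND_UL = 50.0   # PBST per sample after antibody binding and washes
PBST_VOLUMES = 10     # "10 volumes of PBST"; read here as 10 x (beads + antibody), an assumption


@dataclass
class ImsBatch:
    beads_ul: float
    antibody_ul: float
    binding_pbst_ul: float
    resuspension_pbst_ul: float


def ims_batch(n_samples: int, overage: float = 0.1) -> ImsBatch:
    """Scale the per-sample bead/antibody recipe to n samples plus a pipetting overage."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    n = n_samples * (1.0 + overage)  # overage fraction is a hypothetical choice
    beads = BEADS_UL * n
    antibody = ANTIBODY_UL * n
    return ImsBatch(
        beads_ul=round(beads, 2),
        antibody_ul=round(antibody, 2),
        binding_pbst_ul=round(PBST_VOLUMES * (beads + antibody), 2),
        resuspension_pbst_ul=round(RESUSPEND_UL * n, 2),
    )


# Usage: one 96-well plate with the default overage
print(ims_batch(96))
```

Preparing the antibody-coated beads as a single batch and then splitting them into 50 µL aliquots keeps the bead-to-antibody ratio identical across samples.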

Where variations to this standard protocol were used, they are 
indicated in Supplemental Table S5. 

Spike experiment details 

C. trachomatis strain D/314 (urogenital) was grown in tissue culture in a full 24WT for 3 d. These wells were harvested, pooled, bead-beaten, and centrifuged at 1500 rpm for 5 min to remove the cell debris, and the final pellet was resuspended in 200 µL M4RT buffer (Remel, Thermo Fisher Scientific). A 10-fold serial dilution was performed in duplicate in M4RT buffer down to 10⁻⁷. For each sample, 100 µL was used to infect a sub-confluent layer of McCoy cells in one well of a 24WT (cell numbers quantified through cell counting of trypsinized duplicate wells) to determine numbers of infectious Chlamydia, and 100 µL was extracted and quantified by qPCR. IMS was performed on two samples of 100 µL each, of which one was amplified immediately as above and one was analyzed by qPCR. Additional replicates were incubated at 4°C for 48 h and then stored at -80°C for 2 wk to simulate possible sample treatment within a clinic.
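The expected template content along the 10-fold dilution series, and the point at which sampling noise takes over, can be tabulated with a short sketch. The stock concentration in the usage loop is made up; the Poisson term reflects the standard assumption that molecules are randomly distributed in solution, so at about one expected copy per 100 µL aliquot roughly a third of aliquots contain none.

```python
import math
from typing import List, Tuple


def serial_dilution(stock_per_ul: float, fold: int = 10, steps: int = 7) -> List[Tuple[str, float]]:
    """Expected concentration at each step of a serial dilution, from neat down to fold^-steps."""
    series = []
    for k in range(steps + 1):
        label = "neat" if k == 0 else f"{fold}^-{k}"
        series.append((label, stock_per_ul / fold ** k))
    return series


def expected_per_aliquot(conc_per_ul: float, aliquot_ul: float = 100.0) -> float:
    """Mean number of genome copies (or infectious units) in one aliquot."""
    return conc_per_ul * aliquot_ul


def p_empty(mean_copies: float) -> float:
    """Poisson probability that an aliquot contains no copies at all."""
    return math.exp(-mean_copies)


# Usage with a made-up stock of 1e5 genome copies/uL
for label, conc in serial_dilution(1e5):
    m = expected_per_aliquot(conc)
    print(f"{label:>6}: {m:10.3g} copies per 100 uL, P(empty) = {p_empty(m):.2f}")
```

This is why results near the end of such a series are expected to be inconsistent between duplicates, independently of the efficiency of IMS or MDA.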

Stability experiment 

C. trachomatis strain D/314 (urogenital) was grown in tissue culture in a full 24WT for 3 d. As above, these wells were harvested, pooled, bead-beaten, and centrifuged to remove the cell debris, and the final pellet was resuspended in 2SP buffer. Samples were diluted to 10⁻² and 10⁻⁴. Aliquots of 100 µL of these dilutions were inoculated onto a sub-confluent layer of McCoy cells (cell numbers quantified through cell counting of trypsinized duplicate wells) to determine numbers of infectious Chlamydia. Further aliquots of 100 µL were subjected to the standard IMS protocol in replicates of 28. Two were immediately frozen at -80°C for subsequent qPCR analysis of the recovered number of genome copies. Two were immediately amplified, and the resulting DNA was frozen at -80°C for subsequent qPCR analysis. The remaining aliquots were stored at 4°C, -20°C, and room temperature (~20°C) for 7 and 14 d, after which they were treated as above in duplicate.
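The replicate count can be checked by enumerating the design. Our reading, which is an interpretation rather than something stated explicitly, is that "treated as above in duplicate" means both treatments (freezing for qPCR, and amplification followed by qPCR), each in duplicate, at every storage condition; with the day-0 samples this gives exactly the 28 replicates per dilution stated above.

```python
from itertools import product
from typing import List, Tuple

TEMPERATURES = ["4 C", "-20 C", "room temperature (~20 C)"]
DAYS = [7, 14]
TREATMENTS = ["frozen at -80 C for qPCR", "amplified, product frozen for qPCR"]
DUPLICATES = 2


def stability_layout() -> List[Tuple[str, str, int]]:
    """Enumerate (storage condition, treatment, replicate) for the post-IMS aliquots of one dilution."""
    conditions = ["day 0"] + [f"{t}, {d} d" for t, d in product(TEMPERATURES, DAYS)]
    return [
        (cond, treat, rep)
        for cond in conditions
        for treat in TREATMENTS
        for rep in range(1, DUPLICATES + 1)
    ]


layout = stability_layout()
print(len(layout))  # 28
```

Laying the design out explicitly like this is also a convenient way to generate plate maps or sample labels for the stored aliquots.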

Growth in tissue culture after IMS 

C. trachomatis strains D/314 (urogenital) and ATCC strain C/TW3 (ocular) were grown in tissue culture as above and underwent IMS in duplicate. After the final two washes in PBST, one replicate was used for direct infection into one well of a 24WT using centrifugation. The second replicate for each strain was eluted from the magnetic beads using 100 µL 0.1 M citrate pH 3, and both the eluate and the beads were used for infection as above. Positive controls for each strain and negative controls were included. Infections were followed by phase contrast microscopy over 3 d. Images were captured using a Zeiss AxioObserver A1 system.

Data access 

The sequence data were submitted to the European Nucleotide 
Archive (ENA) (http://www.ebi.ac.uk/ena/) with the accession 
numbers given in Supplemental Table S4. 